Answer Questions with Right Image Regions: A Visual Attention Regularization Approach

Abstract

Visual attention in Visual Question Answering (VQA) aims to locate the right image regions for answer prediction, offering a powerful technique to promote multi-modal understanding. However, recent studies have pointed out that the image regions highlighted by visual attention are often irrelevant to the given question and answer, which confuses the model and hinders correct visual reasoning. To tackle this problem, existing methods mostly resort to aligning the visual attention weights with human attention. Nevertheless, gathering such human data is laborious and expensive, making it burdensome to adapt well-developed models across datasets. To address this issue, in this article we devise a novel visual attention regularization approach, namely AttReg, for better visual grounding in VQA. Specifically, AttReg first identifies the image regions that are essential for question answering yet unexpectedly ignored (i.e., assigned low attention weights) by the backbone model. A mask-guided learning scheme is then leveraged to regularize the visual attention so that it focuses more on these ignored key regions. The proposed method is flexible and model-agnostic: it can be integrated into most visual attention-based VQA models and requires no human attention supervision. Extensive experiments over three benchmark datasets, i.e., VQA-CP v2, VQA-CP v1, and VQA v2, have been conducted to evaluate the effectiveness of AttReg. As a by-product, when AttReg is incorporated into the strong baseline LMH, our approach achieves a new state-of-the-art accuracy of 60.00%, an absolute performance gain of 7.01%, on the VQA-CP v2 dataset. In addition to validating effectiveness, we recognize that the faithfulness of visual attention in VQA has not been well explored in the literature. In light of this, we propose to empirically validate this property of visual attention and compare it with the prevalent gradient-based approaches.

Similar resources

Visual Pattern Image Coding by a Morphological Approach (RESEARCH NOTE)

This paper presents an improvement of the Visual Pattern image coding (VPIC) scheme presented by Chen and Bovik in [2] and [3]. The patterns in this improved scheme are defined by morphological operations and classified by absolute error minimization. The improved scheme identifies more uniform blocks and reduces the noise effect. Therefore, it improves the compression ratio and image quality i...

A geometric approach for color image regularization

We present a new vectorial total variation method that addresses the problem of color-consistent image filtering. Our approach is inspired by the double-opponent cell representation in the human visual cortex. Existing vectorial total variation regularizers have insufficient (or no) coupling between the color channels and thus may introduce color artifacts. We address this problem ...

A Geometric Approach to Color Image Regularization

We present a new vectorial total variation method that addresses the problem of color-consistent image filtering. Our approach is inspired by the double-opponent cell representation in the human visual cortex. Existing vectorial total variation regularizers have insufficient (or no) coupling between the color channels and thus may introduce color artifacts. We address this problem ...

Learning to Answer Questions from Image Using Convolutional Neural Network

In this paper, we propose employing a convolutional neural network (CNN) to learn to answer questions about an image. The proposed CNN provides an end-to-end framework that learns not only the image representation and the composition model for the question, but also the intermodal interaction between the image and the question, in order to generate the answer. More specifically, the proposed model cons...

Image retrieval using visual attention

Author: Liam M. Mayron. Title: Image retrieval using visual attention. Institution: Florida Atlantic University. Dissertation Advisor: Dr. Oge Marques. Degree: Doctor of Philosophy. Year: 2008. The retrieval of digital images is hindered by the semantic gap. The semantic gap is the disparity between a user’s high-level interpretation of an image and the information that can be extracted from an image...

Journal

Journal title: ACM Transactions on Multimedia Computing, Communications, and Applications

Year: 2022

ISSN: 1551-6857, 1551-6865

DOI: https://doi.org/10.1145/3498340